NSF PAR Search | NSF Public Access Repository

Researching public health datasets in the era of deep learning: a systematic literature review

https://doi.org/10.1177/14604582241307839

Obeidat, Rand; Alsmadi, Izzat; Baker, Qanita Bani; Al-Njadat, Aseel; Srinivasan, Sriram (January 2025, Health Informatics Journal)

Objective: Explore deep learning applications in predictive analytics for public health data, identify challenges and trends, and understand the current landscape. Materials and Methods: A systematic literature review was conducted in June 2023 to search articles on public health data in the context of deep learning, published from the inception of medical and computer science databases through June 2023. The review focused on diverse datasets, abstracting applications, challenges, and advancements in deep learning. Results: 2004 articles were reviewed, identifying 14 disease categories. Observed trends include explainable AI, patient embedding learning, integration of different data sources, and the use of deep learning models in health informatics. Noted challenges were technical reproducibility and handling sensitive data. Discussion: There has been a notable surge in publications on deep learning applications to public health data since 2015. Consistent deep learning applications and models continue to be applied across public health data. Despite the wide applications, a standard approach still does not exist for addressing the outstanding challenges and issues in this field. Conclusion: Guidelines are needed for applying deep learning and models in public health data to improve FAIRness, efficiency, transparency, comparability, and interoperability of research. Interdisciplinary collaboration among data scientists, public health experts, and policymakers is needed to harness the full potential of deep learning.
Full Text Available
Improving Node Classification Accuracy of GNN through Input and Output Intervention

https://doi.org/10.1145/3610535

Chowdhury, Anjan; Srinivasan, Sriram; Mukherjee, Animesh; Bhowmick, Sanjukta; Ghosh, Kuntal (January 2024, ACM Transactions on Knowledge Discovery from Data)

Graph Neural Networks (GNNs) are a popular machine learning framework for solving various graph processing applications. This framework exploits both the graph topology and the feature vectors of the nodes. One of the important applications of GNNs is the semi-supervised node classification task. The accuracy of node classification using a GNN depends on (i) the number and (ii) the choice of the training nodes. In this article, we demonstrate that increasing the training nodes by selecting nodes from the same class that are spread out across non-contiguous subgraphs can significantly improve the accuracy. We accomplish this by presenting a novel input intervention technique that can be used in conjunction with different GNN classification methods to increase the non-contiguous training nodes and, thereby, improve the accuracy. We also present an output intervention technique to identify misclassified nodes and relabel them with their potentially correct labels. We demonstrate on real-world networks that our proposed methods, both individually and collectively, significantly improve the accuracy in comparison to the baseline GNN algorithms. Both our methods are agnostic. Apart from the initial set of training nodes generated by the baseline GNN methods, our techniques do not need any other extra knowledge about the classes of the nodes. Thus, our methods are modular and can be used as pre- and post-processing steps with many of the currently available GNN methods to improve their accuracy.
Full Text Available
Constant community identification in million-scale networks

https://doi.org/10.1007/s13278-022-00895-8

Chowdhury, Anjan; Srinivasan, Sriram; Bhowmick, Sanjukta; Mukherjee, Animesh; Ghosh, Kuntal (December 2022, Social Network Analysis and Mining)

Full Text Available
Learning Explainable Templated Graphical Model

Embar, Varun; Srinivasan, Sriram; Getoor, Lise (August 2022, Uncertainty in artificial intelligence)

Templated graphical models (TGMs) encode model structure using rules that capture recurring relationships between multiple random variables. While the rules in TGMs are interpretable, it is not clear how they can be used to generate explanations for the individual predictions of the model. Further, learning these rules from data comes with high computational costs: it typically requires an expensive combinatorial search over the space of rules and repeated optimization over rule weights. In this work, we propose a new structure learning algorithm, Explainable Structured Model Search (ESMS), that learns a templated graphical model and an explanation framework for its predictions. ESMS uses a novel search procedure to efficiently search the space of models and discover models that trade off predictive accuracy and explainability. We introduce the notion of relational stability and prove that our proposed explanation framework is stable. Further, our proposed piecewise pseudolikelihood (PPLL) objective does not require re-optimizing the rule weights across models during each iteration of the search. In our empirical evaluation on three real-world datasets, we show that our proposed approach not only discovers models that are explainable, but also significantly outperforms existing state-of-the-art structure learning approaches.
Full Text Available
A taxonomy of weight learning methods for statistical relational learning

https://doi.org/10.1007/s10994-021-06069-5

Srinivasan, Sriram; Dickens, Charles; Augustine, Eriq; Farnadi, Golnoosh; Getoor, Lise (December 2021, Machine Learning)

Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules where the weights of the rules govern probabilistic interactions and are usually learned from data. Existing weight learning approaches typically attempt to learn a set of weights that maximizes some function of data likelihood; however, this does not always translate to optimal performance on a desired domain metric, such as accuracy or F1 score. In this paper, we introduce a taxonomy of search-based weight learning approaches for SRL frameworks that directly optimize weights on a chosen domain performance metric. To effectively apply these search-based approaches, we introduce a novel projection, referred to as scaled space (SS), that is an accurate representation of the true weight space. We show that SS removes redundancies in the weight space and captures the semantic distance between the possible weight configurations. In order to improve the efficiency of search, we also introduce an approximation of SS which simplifies the process of sampling weight configurations. We demonstrate these approaches on two state-of-the-art SRL frameworks: Markov logic networks and probabilistic soft logic. We perform empirical evaluation on five real-world datasets and evaluate them each on two different metrics. We also compare them against four other weight learning approaches. Our experimental results show that our proposed search-based approaches outperform likelihood-based approaches and yield up to a 10% improvement across a variety of performance metrics. Further, we perform an extensive evaluation to measure the robustness of our approach to different initializations and hyperparameters. The results indicate that our approach is both accurate and robust.
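The idea of searching over rule weights to optimize a domain metric directly, rather than a likelihood, can be illustrated with a minimal sketch. This is not the method from the paper: it uses plain random search instead of the approaches in the taxonomy, it does not implement the scaled space projection, and the toy model (a weighted average of per-rule signals thresholded at 0.5), the data in EXAMPLES, and all function names are assumptions made for illustration.

```python
import random

# Hypothetical toy data: each example is (per-rule signals in [0, 1], true label).
# Rule 0 is informative, rule 1 is mostly noise.
EXAMPLES = [
    ((0.9, 0.2), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.9), 0),
    ((0.1, 0.2), 0),
    ((0.7, 0.0), 1),
    ((0.3, 1.0), 0),
]


def predict(weights, signals):
    """Threshold the weight-normalized average of the rule signals at 0.5."""
    total = sum(weights)
    if total == 0:
        return 0  # degenerate weights: fall back to the negative class
    score = sum(w * s for w, s in zip(weights, signals)) / total
    return int(score >= 0.5)


def accuracy(weights, examples):
    """Domain metric optimized by the search (accuracy here; F1 would also work)."""
    correct = sum(1 for signals, label in examples
                  if predict(weights, signals) == label)
    return correct / len(examples)


def random_search(examples, num_rules, trials=200, seed=0):
    """Keep the sampled weight vector with the best metric value."""
    rng = random.Random(seed)
    best_weights = [1.0] * num_rules  # start from uniform weights
    best_acc = accuracy(best_weights, examples)
    for _ in range(trials):
        candidate = [rng.random() for _ in range(num_rules)]
        acc = accuracy(candidate, examples)
        if acc > best_acc:
            best_weights, best_acc = candidate, acc
    return best_weights, best_acc


if __name__ == "__main__":
    weights, acc = random_search(EXAMPLES, num_rules=2)
    print(weights, acc)
```

Because the prediction depends only on the ratio of the weights, many weight vectors in this toy model give identical predictions. That is one simple kind of redundancy in a weight space; whether it matches the redundancies the scaled space removes is not something this sketch establishes.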
Constant community identification in million scale networks using image thresholding algorithms

https://doi.org/10.1145/3487351.3488350

Chowdhury, Anjan; Srinivasan, Sriram; Bhowmick, Sanjukta; Mukherjee, Animesh; Ghosh, Kuntal (November 2021, ASONAM '21: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining)

Constant communities, i.e., groups of vertices that are always clustered together, independent of the community detection algorithm used, are necessary for reducing the inherent stochasticity of community detection results. Current methods for identifying constant communities require multiple runs of community detection algorithm(s). This process is extremely time consuming and not scalable to large networks. We propose a novel approach for finding the constant communities, by transforming the problem to a binary classification of edges. We apply the Otsu method from image thresholding to classify edges based on whether they are always within a community or not. Our algorithm does not require any explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Our results on real-world graphs show that our method is significantly faster and the constant communities produced have higher accuracy (as per F1 and NMI scores) than state-of-the-art baseline methods.
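As a rough illustration of treating constant-community detection as a binary classification of edges, the sketch below applies Otsu's criterion (choose the cut that maximizes between-class variance) to one score per edge. The abstract does not say which edge score the authors use, so the scores here are hypothetical placeholders, and the function names otsu_threshold and classify_edges are likewise assumptions. The threshold is computed exactly over the sorted scores rather than over a histogram, which is a simplification of the usual image-processing formulation.

```python
def otsu_threshold(scores):
    """Return the cut between two classes that maximizes between-class variance."""
    if not scores:
        raise ValueError("scores must be non-empty")
    xs = sorted(scores)
    n = len(xs)
    total = sum(xs)
    best_cut, best_var = xs[0], -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between two equal values
        w0 = i / n                      # fraction of scores below the cut
        w1 = 1.0 - w0                   # fraction of scores above the cut
        mu0 = left_sum / i              # mean of the lower class
        mu1 = (total - left_sum) / (n - i)  # mean of the upper class
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var = between_var
            best_cut = (xs[i - 1] + xs[i]) / 2
    return best_cut


def classify_edges(edge_scores):
    """Map each edge to True if its score lies above the Otsu cut."""
    cut = otsu_threshold(list(edge_scores.values()))
    return {edge: score > cut for edge, score in edge_scores.items()}


if __name__ == "__main__":
    # Hypothetical scores: higher means the endpoints are more likely
    # to always be clustered together.
    edge_scores = {
        ("a", "b"): 0.90, ("b", "c"): 0.85, ("a", "c"): 0.80,
        ("c", "d"): 0.10, ("d", "e"): 0.15, ("e", "f"): 0.20,
    }
    print(classify_edges(edge_scores))
```

Edges labeled True would then be candidates for lying inside a constant community. No community detection run is needed to obtain the labels, which is the scalability argument the abstract makes.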
Full Text Available
A comparison of statistical relational learning and graph neural networks for aggregate graph queries

https://doi.org/10.1007/s10994-021-06007-5

Embar, Varun; Srinivasan, Sriram; Getoor, Lise (June 2021, Machine Learning)

Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQs) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.
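The gap between evaluating a query on inferred labels and computing it as an expectation can be seen in a small sketch. It assumes binary labels and independent per-node marginals, which is a strong simplification: the paper computes expectations under a joint distribution. The query (number of edges whose endpoints share a label), the marginals, and all function names are hypothetical.

```python
import random


def same_label_edges(labels, edges):
    """Aggregate query: count edges whose endpoints have the same label."""
    return sum(1 for u, v in edges if labels[u] == labels[v])


def plug_in_estimate(marginals, edges):
    """Round each marginal to its most likely label, then evaluate the query."""
    labels = {node: int(p >= 0.5) for node, p in marginals.items()}
    return same_label_edges(labels, edges)


def sampled_expectation(marginals, edges, num_samples=20000, seed=0):
    """Monte Carlo estimate of the expected query value.

    Labels are drawn independently per node (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        labels = {node: int(rng.random() < p) for node, p in marginals.items()}
        total += same_label_edges(labels, edges)
    return total / num_samples


def exact_expectation(marginals, edges):
    """Closed form under independence: P(same) = p_u p_v + (1 - p_u)(1 - p_v)."""
    return sum(marginals[u] * marginals[v]
               + (1 - marginals[u]) * (1 - marginals[v])
               for u, v in edges)


if __name__ == "__main__":
    marginals = {"a": 0.6, "b": 0.6, "c": 0.4}  # P(label = 1) per node
    edges = [("a", "b"), ("b", "c"), ("a", "c")]
    print(plug_in_estimate(marginals, edges))
    print(exact_expectation(marginals, edges))
    print(sampled_expectation(marginals, edges))
```

In this toy triangle the plug-in approach counts a single same-label edge, while the expected count under the stated marginals is 1.48, so rounding the labels first discards the uncertainty that the expectation retains.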
BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic

Srinivasan, Sriram; Farnadi, Golnoosh; Getoor, Lise (January 2020, AAAI Conference on Artificial Intelligence (AAAI))

Full Text Available
Joint Estimation of User And Publisher Credibility for Fake News Detection

Chowdhury, Rajdipa; Srinivasan, Sriram; Getoor, Lise (January 2020, International Conference on Information and Knowledge Management (CIKM))

Full Text Available
Estimating Aggregate Properties in Relational Networks with Unobserved Data

Embar, Varun; Srinivasan, Sriram; Getoor, Lise (January 2020, International Workshop on Statistical Relational AI (StarAI))

Full Text Available
